As the global demand for sustainable energy grows, wind energy has emerged as one of the most mature and widely adopted renewable technologies. However, maintaining wind turbines is both complex and costly, especially when failures go undetected until critical damage occurs. ReneWind, a company specializing in wind energy systems, faces the challenge of predicting generator failures using sensor data collected from turbine components and environmental conditions. The data is ciphered for confidentiality and includes 40 predictors across thousands of observations. The key issue is to accurately identify potential failures before they occur to minimize costly replacements and optimize maintenance operations.
The objective is to develop and evaluate machine learning classification models that can predict generator failures in wind turbines using sensor-derived features. The models should prioritize minimizing false negatives (missed failures) due to their high replacement cost, while balancing false positives (unnecessary inspections) and true positives (repairs). The final model will be selected based on its predictive performance and its ability to reduce overall maintenance costs when applied to unseen test data.
pip install tensorflow imbalanced-learn
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Libraries to help with data visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Library to encode the variables
from sklearn.preprocessing import OneHotEncoder
#Library to scale the data
from sklearn.preprocessing import RobustScaler
# Library to split data
from sklearn.model_selection import train_test_split
# library to import different optimizers
from tensorflow.keras import optimizers
# Library to import different loss functions
from tensorflow.keras import losses
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.models import Sequential
from tensorflow.keras.metrics import Precision, Recall
# Library to avoid the warnings
import warnings
warnings.filterwarnings('ignore')
# importing keras library
from tensorflow import keras
# Utility to one-hot encode integer class labels
from tensorflow.keras.utils import to_categorical
# library to plot classification report
from sklearn.metrics import classification_report
# library to import Batch Normalization
from tensorflow.keras.layers import BatchNormalization
# Library to import Dropout
from tensorflow.keras.layers import Dropout
from tensorflow.keras.optimizers import Adam
from imblearn.over_sampling import SMOTE
# Load the training data
df = pd.read_csv('Train.csv')
df1 = pd.read_csv('Test.csv')
print(df.columns.tolist())
print(df1.columns.tolist())
df.head()
# 1. Dataset dimensions
print("Shape of training data:", df.shape)
# 2. Column names
print("\nColumn names:")
print(df.columns.tolist())
# 3. Data types and non-null counts
print("\nData types and missing values:")
df.info()  # prints the summary directly; wrapping it in print() would also print None
# 4. Summary statistics for numeric features
print("\nSummary statistics:")
print(df.describe())
# 5. Check for missing values
# Count missing values per column
missing_counts = df.isnull().sum()
# Filter only columns with missing values
missing_counts = missing_counts[missing_counts > 0]
# Display results
print("Missing values per column:")
print(missing_counts if not missing_counts.empty else "No missing values found.")
Shape: 20,000 rows & 41 columns
- 40 sensor-based features and 1 target variable

Target: Binary classification
- 0 = No failure, 1 = Failure

Missing Values:
- Only V1 and V2 have missing values (18 each)
- That is just 0.09% of the dataset, so they can be safely imputed or dropped

Data Types:
- All features are float64, target is int64
- Ready for numerical modeling (no encoding needed)
# Target variable distribution
print(df['Target'].value_counts())
print(df['Target'].value_counts(normalize=True))
#Univariate Analysis
sns.set(style="whitegrid")
features = [col for col in df.columns if col != 'Target']
# Histograms for all features
for col in features:
    plt.figure(figsize=(6, 3))
    sns.histplot(df[col], kde=True, bins=30, color='skyblue')
    plt.title(f'Distribution of {col}')
    plt.xlabel(col)
    plt.ylabel('Frequency')
    plt.tight_layout()
    plt.show()
Right-Skewed Features:
- Values cluster low with a long tail to the right
- V3, V32, V33, V35, V36, V39

Left-Skewed Features:
- Values cluster high with a long tail to the left
- V1, V2, V6, V7, V8, V34, V38, V40

Symmetric or Near-Normal Features:
- Centered around zero with balanced spread
- V4, V5, V9, V10, V11, V13, V15 to V31, V37
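The visual groupings can be cross-checked numerically with pandas' `skew()` (positive values mean a right tail, negative a left tail). A minimal sketch on synthetic stand-in columns; in the notebook, `df[features].skew()` would be run directly:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for three of the sensor columns (illustration only)
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "V3": rng.exponential(1.0, 1000),    # right-skewed: long right tail
    "V4": rng.normal(0.0, 1.0, 1000),    # near-symmetric
    "V6": -rng.exponential(1.0, 1000),   # left-skewed: long left tail
})

# Sample skewness per column; |skew| > ~0.5 is a common rule of thumb
skew = demo.skew().sort_values()
right_skewed = skew[skew > 0.5].index.tolist()
left_skewed = skew[skew < -0.5].index.tolist()
print("Right-skewed:", right_skewed)
print("Left-skewed:", left_skewed)
```

The 0.5 cut-off is a convention, not a hard rule; borderline columns are best judged together with their histograms.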
features = [col for col in df.columns if col != 'Target']
sns.set(style="whitegrid")
for col in features:
    plt.figure(figsize=(6, 3))
    sns.kdeplot(data=df[df['Target'] == 0], x=col, label='No Failure (0)', fill=True, color='green')
    sns.kdeplot(data=df[df['Target'] == 1], x=col, label='Failure (1)', fill=True, color='red')
    plt.title(f'Distribution of {col} by Target Class')
    plt.xlabel(col)
    plt.ylabel('Density')
    plt.legend()
    plt.tight_layout()
    plt.show()
Strong Separation
These features show clear distribution differences between Target = 0 and Target = 1. They are highly predictive and should be prioritized in modeling.
- V3
- V12
- V14
- V35
- V38

Why they matter:
- Their values shift noticeably between failure and non-failure cases.
- They may show different peaks, spreads, or outlier patterns for each class.

Moderate Separation
These features show some difference between the two target classes, but less sharply. They may still help the model when combined with others.
- V1, V2, V4, V6, V8
- V18, V32, V33, V34, V36, V39, V40

Why they matter:
- They may show wider spread or subtle shifts in mean/median between classes.
- Useful in nonlinear models like neural networks.

Weak Separation
These features show similar distributions for both Target = 0 and Target = 1. They are less informative and may be dropped or deprioritized.
- V5, V7, V9, V10, V11, V13
- V15 to V31
- V37

Why they're less useful:
- Their distributions overlap heavily between classes.
- They may add noise or redundancy unless combined cleverly.
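The strong/moderate/weak grouping can be made less subjective by ranking features on the standardized difference in class means. A sketch with a hypothetical `class_separation` helper, run on a synthetic stand-in frame (the notebook would pass `df` itself):

```python
import numpy as np
import pandas as pd

def class_separation(frame, target="Target"):
    """Absolute difference in class means per feature, in units of the
    feature's overall standard deviation (larger = better separation)."""
    means = frame.groupby(target).mean()
    stds = frame.drop(columns=[target]).std()
    return ((means.loc[1] - means.loc[0]).abs() / stds).sort_values(ascending=False)

# Synthetic stand-in: one separating feature (V3) and one uninformative one (V5)
rng = np.random.default_rng(0)
y = np.array([0] * 900 + [1] * 100)
demo = pd.DataFrame({
    "V3": np.where(y == 1, rng.normal(2, 1, 1000), rng.normal(0, 1, 1000)),
    "V5": rng.normal(0, 1, 1000),
    "Target": y,
})

sep = class_separation(demo)
print(sep)  # V3 should rank far above V5
```

This complements the KDE plots: a large standardized mean shift confirms what the eye sees, while features with overlapping densities score near zero.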
# Boxplots: feature vs Target
for col in features:
    plt.figure(figsize=(6, 3))
    sns.boxplot(x='Target', y=col, data=df, palette='Set2')
    plt.title(f'{col} by Failure Status')
    plt.xlabel('Target (0 = No Failure, 1 = Failure)')
    plt.ylabel(col)
    plt.tight_layout()
    plt.show()
Strong Target Separation
These features show clear shifts in median or spread between failure and non-failure cases:
- V3: Higher median for failures; wide spread with visible outliers → strong predictor
- V12: Lower median for failures; compact spread for Target = 0, wider for Target = 1
- V14: Similar to V12; failures show more variability
- V35: Failures have higher values and more outliers → strong signal
- V38: Failures skew lower; clear separation in box height
- Outliers: Present in both classes, but more frequent and extreme in Target = 1

Moderate Separation
These features show some difference, often in spread or outlier behavior:
- V1, V2: Slight shift in median; failures show more spread
- V4, V6, V8: Wider boxes for failures; some outlier clusters
- V32, V33, V34, V36, V39, V40: Failures show more variability and outliers
- Outliers: Often concentrated in the failure class, suggesting sensor spikes or anomalies

Weak Separation
These features show similar box plots for both classes, with little to no difference in median or spread:
- V5, V7, V9-V11, V13, V15-V31, V37
- Outliers: Present but evenly distributed across both classes; may reflect noise or low signal

Outlier Behavior Summary
- Frequent outliers: V3, V35, V38, V32, V36
- Failure-specific outliers: Often seen in Target = 1 for V12, V14, V33, V34
- Symmetric outliers: Found in V5, V10, V20; likely noise
# correlation matrix
plt.figure(figsize=(12, 10))
corr = df[features].corr()
sns.heatmap(corr, cmap='coolwarm', center=0, square=True, cbar_kws={'shrink': 0.5})
plt.title('Feature Correlation Matrix')
plt.show()
Strong Positive Correlations (r > 0.75)
These feature pairs move together; when one increases, the other tends to increase too:
- V12 and V14
- V35 and V36
- V3 and V38
- Implication: These may be redundant. Consider dropping one of each pair or combining them via PCA or feature averaging.

Strong Negative Correlations (r < -0.5)
These features move in opposite directions:
- V6 and V40
- V8 and V34
- Implication: These may capture complementary behaviors, useful for modeling nonlinear interactions.

Weak or No Correlation (r ≈ 0)
Many features show low correlation with others:
- V22 to V31 are mostly uncorrelated with the rest
- Implication: These may add unique signal or noise. Evaluate their predictive power individually.
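The highly correlated pairs can be extracted programmatically rather than read off the heatmap. A sketch on stand-in data; in the notebook, `df[features].corr()` would be used in place of the demo frame:

```python
import numpy as np
import pandas as pd

# Stand-in data with one strongly correlated pair (illustration only)
rng = np.random.default_rng(1)
a = rng.normal(size=500)
demo = pd.DataFrame({
    "V12": a,
    "V14": a + rng.normal(scale=0.1, size=500),  # near-duplicate of V12
    "V22": rng.normal(size=500),                 # independent
})

# Keep only the upper triangle so each pair appears once
corr = demo.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Pairs above the redundancy threshold used above (|r| > 0.75)
pairs = upper.stack()
strong_pairs = pairs[pairs > 0.75].index.tolist()
print(strong_pairs)
```

Working from the absolute correlation also catches strong negative pairs such as V6/V40 in one pass.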
#correlation with target
cor_target = df.corr()['Target'].drop('Target').sort_values(ascending=False)
print("Top positively correlated features with failure:")
print(cor_target.head(10))
print("\nTop negatively correlated features with failure:")
print(cor_target.tail(10))
#Split features and target on training data
X_train = df.drop(columns=['Target'])
y_train = df['Target']
# Impute missing values in V1 and V2 using training medians
for col in ['V1', 'V2']:
    X_train[col] = X_train[col].fillna(X_train[col].median())
# Handling outliers in training data by clipping to the 1st/99th percentiles
def winsorize(df, lower=0.01, upper=0.99):
    df_clipped = df.copy()
    for col in df.columns:
        low = np.percentile(df[col], lower * 100)
        high = np.percentile(df[col], upper * 100)
        df_clipped[col] = np.clip(df[col], low, high)
    return df_clipped
X_train = winsorize(X_train)
#feature scaling on training data
scaler = RobustScaler()
X_train_scaled = scaler.fit_transform(X_train)
#target column balancing using smote on training data
smote = SMOTE(random_state=42)
X_train_bal, y_train_bal = smote.fit_resample(X_train_scaled, y_train)
# Prepare test features: drop 'Target' if present in df1
X_test = df1.drop(columns=['Target'], errors='ignore')  # Safe even if 'Target' is missing
# Missing-value treatment using medians learned from the training data
for col in ['V1', 'V2']:
    X_test[col] = X_test[col].fillna(X_train[col].median())
# Winsorize the test data using thresholds learned from the training data
def winsorize_with_reference(df, reference, lower=0.01, upper=0.99):
    df_clipped = df.copy()
    for col in df.columns:
        low = np.percentile(reference[col], lower * 100)
        high = np.percentile(reference[col], upper * 100)
        df_clipped[col] = np.clip(df[col], low, high)
    return df_clipped
X_test = winsorize_with_reference(X_test, reference=X_train)
# Scale with the scaler fitted on the training data
X_test_scaled = scaler.transform(X_test)
Metric of Choice: F1 Score
Why F1 Score?
- Dataset is imbalanced (only ~5.5% failures)
- Accuracy would be misleading; it could be high even if the model misses all failures
- F1 Score balances precision and recall, making it ideal for rare-event detection like turbine failures
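Concretely, with a hypothetical confusion matrix the arithmetic works out as follows (the counts are for illustration only):

```python
# Hypothetical counts: 70 caught failures, 10 false alarms, 25 missed failures
tp, fp, fn = 70, 10, 25

precision = tp / (tp + fp)                          # 70/80 = 0.875
recall = tp / (tp + fn)                             # 70/95 ≈ 0.737
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean = 0.8
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it is dragged down by whichever of precision or recall is worse, which is exactly the behavior we want when both false alarms and missed failures carry cost.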
# Step 1: Define model
model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train_bal.shape[1],)),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='sigmoid')  # Binary classification
])
# Step 2: Compile with SGD optimizer
sgd = SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=sgd,
              loss='binary_crossentropy',
              metrics=['accuracy', Precision(), Recall()])
# Step 3: Train the model
history = model.fit(X_train_bal, y_train_bal, epochs=50, batch_size=32, validation_split=0.2, verbose=1)
Training Metrics (Epoch 50)
- Accuracy: 96.98%
- Loss: 0.0944
- Precision: 98.57%
- Recall: 93.44%

Validation Metrics (Epoch 50)
- Accuracy: 93.45%
- Loss: 0.1292
- Precision: 100.00%
- Recall: 93.45%

High precision and recall on both training and validation sets confirm that the model is:
- Confident in its predictions (precision)
- Sensitive to actual failures (recall)

Validation metrics closely track training metrics, suggesting:
- No overfitting
- Strong generalization
- Loss is low and stable, indicating convergence
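The convergence and overfitting claims are easiest to judge from the loss curves. A minimal plotting sketch, with hypothetical values standing in for `history.history` from `model.fit`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs in scripts
import matplotlib.pyplot as plt

# Hypothetical stand-in for history.history (illustration only)
hist = {
    "loss": [0.52, 0.31, 0.18, 0.12, 0.09],
    "val_loss": [0.48, 0.30, 0.20, 0.15, 0.13],
}

plt.figure(figsize=(6, 3))
for key, values in hist.items():
    plt.plot(range(1, len(values) + 1), values, label=key)
plt.xlabel("Epoch")
plt.ylabel("Binary cross-entropy")
plt.title("Training vs validation loss")
plt.legend()
plt.tight_layout()
plt.savefig("loss_curves.png")
```

A widening gap between the two curves in later epochs would be the visual signature of overfitting; here both decline together.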
# Extract target from df1
y_test = df1['Target']
# Predict on the scaled test features prepared earlier
y_pred_prob = model.predict(X_test_scaled)
y_pred = (y_pred_prob > 0.5).astype(int)
# Classification report
from sklearn.metrics import f1_score, confusion_matrix, classification_report
print(classification_report(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
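Since missed failures are the costliest error, the default 0.5 cut-off is not sacred; scanning thresholds over the predicted probabilities is a cheap way to trade a few extra false positives for fewer false negatives. A sketch on synthetic probabilities (in the notebook, `y_test` and `y_pred_prob` would be used instead):

```python
import numpy as np
from sklearn.metrics import f1_score

# Synthetic stand-ins: 90 healthy, 10 failures, with overlapping score ranges
rng = np.random.default_rng(7)
y_true = np.array([0] * 90 + [1] * 10)
y_prob = np.clip(np.where(y_true == 1,
                          rng.normal(0.70, 0.15, 100),
                          rng.normal(0.25, 0.15, 100)), 0, 1)

# Scan candidate thresholds and keep the one with the best F1
thresholds = np.arange(0.10, 0.90, 0.05)
scores = [f1_score(y_true, (y_prob >= t).astype(int)) for t in thresholds]
best_t = float(thresholds[int(np.argmax(scores))])
print(f"best threshold={best_t:.2f}, F1={max(scores):.3f}")
```

If the business cost of a false negative is quantified, the same scan can maximize a cost-weighted score instead of F1.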
model_1 = Sequential([
    Dense(64, activation='relu', input_shape=(X_train_bal.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])
model_1.compile(optimizer=SGD(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])
model_1.fit(X_train_bal, y_train_bal, epochs=50, batch_size=32, validation_split=0.2)
# Predict on the scaled test features (the model was trained on scaled data)
y_pred_1 = (model_1.predict(X_test_scaled) > 0.5).astype(int)
print("\n📊 Model 1: Baseline (SGD, 2 layers)")
print(classification_report(y_test, y_pred_1))
print("F1 Score:", f1_score(y_test, y_pred_1))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_1))
# Deeper Network (SGD, 4 layers)
model_2 = Sequential([
    Dense(128, activation='relu', input_shape=(X_train_bal.shape[1],)),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])
model_2.compile(optimizer=SGD(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])
model_2.fit(X_train_bal, y_train_bal, epochs=50, batch_size=32, validation_split=0.2)
y_pred_2 = (model_2.predict(X_test_scaled) > 0.5).astype(int)
print("\n📊 Model 2: Deeper Network (SGD, 4 layers)")
print(classification_report(y_test, y_pred_2))
print("F1 Score:", f1_score(y_test, y_pred_2))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_2))
# Dropout regularization (SGD, 3 layers + dropout)
model_3 = Sequential([
    Dense(64, activation='relu', input_shape=(X_train_bal.shape[1],)),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='sigmoid')
])
model_3.compile(optimizer=SGD(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])
model_3.fit(X_train_bal, y_train_bal, epochs=50, batch_size=32, validation_split=0.2)
y_pred_3 = (model_3.predict(X_test_scaled) > 0.5).astype(int)
print("\n📊 Model 3: Dropout Regularization (SGD)")
print(classification_report(y_test, y_pred_3))
print("F1 Score:", f1_score(y_test, y_pred_3))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_3))
# Adam optimizer (same architecture as Model 3 but with Adam)
model_4 = Sequential([
    Dense(64, activation='relu', input_shape=(X_train_bal.shape[1],)),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='sigmoid')
])
model_4.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])
model_4.fit(X_train_bal, y_train_bal, epochs=50, batch_size=32, validation_split=0.2)
y_pred_4 = (model_4.predict(X_test_scaled) > 0.5).astype(int)
print("\n📊 Model 4: Dropout + Adam")
print(classification_report(y_test, y_pred_4))
print("F1 Score:", f1_score(y_test, y_pred_4))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_4))
from sklearn.utils.class_weight import compute_class_weight
# Class weights (SGD + dropout + class weights)
# Note: the weights are computed from the original imbalanced y_train but applied on top
# of the already SMOTE-balanced training data, so failure samples end up double-weighted
class_weights = compute_class_weight(class_weight='balanced', classes=np.unique(y_train), y=y_train)
weights_dict = dict(zip(np.unique(y_train), class_weights))
model_5 = Sequential([
    Dense(64, activation='relu', input_shape=(X_train_bal.shape[1],)),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='sigmoid')
])
model_5.compile(optimizer=SGD(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])
model_5.fit(X_train_bal, y_train_bal, epochs=50, batch_size=32, validation_split=0.2, class_weight=weights_dict)
y_pred_5 = (model_5.predict(X_test_scaled) > 0.5).astype(int)
print("\n📊 Model 5: Dropout + Class Weights (SGD)")
print(classification_report(y_test, y_pred_5))
print("F1 Score:", f1_score(y_test, y_pred_5))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_5))
# Adam + Deep + Class Weights + Dropout
model_6 = Sequential([
    Dense(128, activation='relu', input_shape=(X_train_bal.shape[1],)),
    Dropout(0.3),
    Dense(64, activation='relu'),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])
model_6.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])
model_6.fit(X_train_bal, y_train_bal, epochs=50, batch_size=32, validation_split=0.2, class_weight=weights_dict)
y_pred_6 = (model_6.predict(X_test_scaled) > 0.5).astype(int)
print("\n📊 Model 6: Deep + Dropout + Class Weights (Adam)")
print(classification_report(y_test, y_pred_6))
print("F1 Score:", f1_score(y_test, y_pred_6))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_6))
Model-by-Model Commentary

Model 1: Baseline (SGD, 2 layers)
- Weak F1 score (0.48) due to very low precision (0.35)
- Overpredicts failures → high false positives
- Not suitable for deployment

Model 2: Deeper Network (SGD, 4 layers)
- High precision (0.88) but recall drops to 0.62
- Misses many actual failures
- Better than baseline, but not optimal for recall-sensitive tasks

Model 3: Dropout Regularization (SGD)
- Best F1 score (0.83) among all models
- Excellent precision (0.94) and strong recall (0.74)
- Low false positives and moderate false negatives
- Best balance between catching failures and avoiding false alarms

Model 4: Dropout + Adam
- Similar to Model 3 but lower recall (0.62) → more missed failures
- Slightly lower F1 (0.75) despite using Adam

Model 5: Dropout + Class Weights (SGD)
- Recall is decent (0.77) but precision collapses (0.36)
- F1 score (0.49) is poor due to high false positives
- Not reliable

Model 6: Deep + Dropout + Class Weights (Adam)
- Balanced precision (0.75) and recall (0.70)
- F1 score (0.72) is decent, but not as strong as Model 3
- Slightly more complex architecture without added benefit

Best Model: Model 3 (Dropout Regularization, SGD)
Justification:
- Highest F1 score (0.83) for the minority class (failures)
- Excellent precision (0.94) → very few false alarms
- Strong recall (0.74) → catches most failures
- Simple architecture with dropout regularization for generalization
- Outperforms deeper or more complex models (like Model 6) in both effectiveness and efficiency
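Finally, since the brief asks for cost reduction rather than F1 alone, the comparison can be restated in money terms. A hedged sketch: the per-event costs below are hypothetical placeholders (ReneWind's actual replacement, repair, and inspection costs are not given in the data), and the confusion-matrix counts are illustrative:

```python
# Hypothetical per-event costs (placeholders; real figures would come from ReneWind)
COST_FN = 40_000   # missed failure -> generator replacement
COST_TP = 15_000   # caught failure -> pre-emptive repair
COST_FP = 1_000    # false alarm -> unnecessary inspection

def maintenance_cost(tp, fp, fn):
    """Total maintenance cost implied by a confusion matrix under the assumed prices."""
    return tp * COST_TP + fp * COST_FP + fn * COST_FN

# Illustrative error profiles: a high-precision model vs a high-false-positive one
print(maintenance_cost(tp=41, fp=3, fn=14))    # 1_178_000
print(maintenance_cost(tp=43, fp=77, fn=13))   # 1_242_000
```

Plugging each candidate model's actual test-set confusion matrix into such a function would let the final selection be defended in the business's own units, not just in F1 points.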